Achieving clinically optimal balance between accuracy and simplicity of a formula for manual use: Development of a simple formula for estimating liver graft weight with donor anthropometrics

In developing a formula for manual use in clinical settings, simplicity is as important as accuracy. Whole-liver (WL) mass is often estimated using demographic and anthropometric information to calculate the standard liver volume or recommended graft volume in liver transplantation. Multiple formulas for estimating WL mass have been reported, including those with multiple independent variables. However, it is unknown whether multivariable models lead to clinically meaningful improvements in accuracy over univariable models. Our goal was to quantitatively define a clinically meaningful improvement in accuracy that justifies an additional independent variable, and to identify an estimation formula for WL graft weight that best balances accuracy and simplicity under that criterion. From the Japanese Liver Transplantation Society registry, which contains data on all liver transplant cases in Japan, 129 WL donor-graft pairs were extracted. Among the candidate models, those with the smallest cross-validation (CV) root-mean-square error (RMSE) were selected, penalizing model complexity by requiring more complex models to yield a ≥5% decrease in CV RMSE. The winning model by voting with random subsets was fitted to the entire dataset to obtain the final formula. External validity was assessed using CV. A simple univariable linear regression formula using body weight (BW) was obtained as follows: WL graft weight [g] = 14.8 × BW [kg] + 439.2. The CV RMSE (g) and coefficient of determination (R²) were 195.2 and 0.548, respectively. In summary, in the development of a simple formula for manually estimating WL weight using demographic and anthropometric variables, a clinically acceptable trade-off between accuracy and simplicity was quantitatively defined, and the best model was selected using this criterion. A univariable linear model using BW achieved a clinically optimal balance between simplicity and accuracy, while one using body surface area performed similarly.


Importance of simplicity in developing formulas for manual use
Despite recent advances in machine learning, there are still some clinical areas in which complex algorithms fail to provide clinically meaningful gain of accuracy compared to relatively simple formulas because of the limited availability of data and high variability of the subject, leaving traditional simple estimation/prediction formulas yet to be replaced. When developing a formula in such areas, in addition to predictive accuracy, simplicity and ease of use are major concerns because they are often used manually, and the complexity of the model might limit its use or lead to errors.

Improving and assessing the external validity of estimation models
For an estimation formula to be of practical value, it is important to ensure that such a formula provides reasonably accurate estimations for unseen samples from a defined population. In other words, external validity should be maximized and confirmed.
Overfitting is a term that describes one of the potential issues in selecting a more complex model. It occurs when a more complex model fails to perform better in an unseen sample from a defined population compared to a simpler model, whereas it performs better in a training sample from the same population. There are established techniques for improving external validity and avoiding overfitting, including cross-validation (CV) and bootstrapping, and these are also used to assess the model's external validity.

Beyond avoiding overfitting: Seeking practically optimal balance between simplicity and accuracy
Even in the absence of overfitting, the higher complexity of a formula may only lead to marginal rather than practically meaningful improvements in accuracy. In this study, this situation is referred to as superfitting.
While there are some widely used statistical measures and methods for balancing the complexity and accuracy of estimation models, including adjusted R², they do not reflect domain knowledge or help avoid superfitting. A mathematical criterion for avoiding superfitting needs to reflect its context and purpose. In the simplest form, this involves three steps: (1) defining the measure of model complexity, (2) defining the measure of model accuracy, and (3) defining acceptable trade-offs between these measures. To the best of our knowledge, no such mathematical criterion has been reported in clinical research or any other areas.

Model accuracy measure for avoiding superfitting: RMSE
In applied studies including clinical medicine, the coefficient of determination (R²), defined as the ratio of the variance explained by the model to the total variance, is often used as the primary measure of accuracy. While R² enables a comparison of accuracy between models whose dependent variables have different variances and scales (such as grams and milliliters, representing weight and volume, respectively), it does not convey a difference in accuracy on a clinically meaningful scale. This makes R² unsuitable when an optimal balance between the accuracy and simplicity of a model is sought.
The root-mean-square error (RMSE) is, as its name describes, a summary measure of residuals, or "error," in the dimension of the measurements, not as a ratio. Because of this, RMSE allows clinicians to judge the pragmatic significance of accuracy differences between models. Using RMSE for assessing model accuracy enables an objective definition of a clinically acceptable trade-off between the simplicity and accuracy of an estimation formula.
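The contrast between the two measures can be illustrated with a minimal Python sketch. The graft weights and estimates below are hypothetical values chosen for illustration, not study data, and R² is computed here as 1 − (residual sum of squares / total sum of squares), which is one common formulation and an assumption of this sketch.

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error, in the same unit as the measurements."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def r_squared(actual, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (unitless)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical graft weights and model estimates in grams (not study data).
actual_g = [1100.0, 1300.0, 1500.0, 1700.0]
estimated_g = [1200.0, 1250.0, 1450.0, 1800.0]

# The same values expressed in kilograms.
actual_kg = [w / 1000.0 for w in actual_g]
estimated_kg = [w / 1000.0 for w in estimated_g]

print(rmse(actual_g, estimated_g))        # error in grams
print(rmse(actual_kg, estimated_kg))      # same error in kilograms
print(r_squared(actual_g, estimated_g))   # unitless
print(r_squared(actual_kg, estimated_kg)) # unchanged by the unit
```

RMSE changes with the unit and can be read directly as a typical error in grams, whereas R² is identical in both units and therefore says nothing about the size of the error on the clinical scale.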

Estimation of WL graft mass using demographic and anthropometric variables
Estimation of WL mass using demographic and anthropometric information is important in liver transplantation, as it is used to calculate the standard liver volume (SLV) and the recommended graft mass for a recipient. Several such estimation formulas have been developed using parameters such as body weight (BW), body surface area (BSA), body height (BH), age, and sex (Table 1) [1][2][3][4][5][6][7][8][9][10][11][12][13]. However, it is not established which independent variable or combination of variables serves best for this purpose.
Some of the previously reported estimation formulas for WL graft mass have multiple independent variables, and some have even more complex structures than simple linear formulas (Table 1). However, cross-validation was not employed for model selection in any of the previous reports. Also, most of the previous reports on multivariable and complex-structured estimation formulas did not even compare the performance of their proposed models against simpler alternatives. This suggests that the previously reported formulas with multiple independent variables and those with relatively complex structures might suffer from overfitting. Such overfitting, by definition, might manifest even in samples similar to their training samples, for example, one in the same country with a similar "racial" composition.
Also, the concern for superfitting was not addressed in any of the previous reports, all of which used R² as the primary measure of accuracy. While some of them claimed superior accuracy of the proposed multivariable and complex formulas over previously reported simpler ones, the nature of R² indicates that, even in the absence of overfitting, such formulas might not yield clinically meaningful improvements in accuracy over simpler alternatives, that is, they might be superfitting.
In this study, we aimed to develop a formula for manually estimating the weight of WL grafts from adult donors using donor anthropometric and demographic variables. To achieve a clinically optimal balance between accuracy and simplicity of the model, that is, to avoid overfitting and superfitting, an objectively defined criterion based on RMSE was implemented in the model selection process.

Japanese Liver Transplantation Society database
The study protocol was approved by the project committee of the Japanese Liver Transplantation Society (JLTS) and the Institutional Review Board of Osaka General Medical Center in accordance with the Declaration of Helsinki.
We used the Japanese Liver Transplantation Society (JLTS) database for information on liver transplant donors and grafts. The JLTS database is a registry of all liver transplant cases in Japan, including both living and deceased donors, operated by JLTS since 2012. The registry includes donor information such as type (living or deceased), age, sex, BW, BH, ABO and Rh blood types, and graft weight. The minimal data set underlying the conclusions is provided in S1 Data.

Analysis cohorts and variables
The process of creating the analysis cohort is illustrated in Fig 1. Only data on whole-liver grafts from deceased donors were used in the current study.

Selecting a model through "inner" CV
The weight of the liver grafts was used as the dependent variable for each linear regression model using any of the above candidate independent variables or their combinations (Fig 2). The root-mean-square error (RMSE) of the candidate models was averaged through "inner" CV (number of folds = 10, number of repetitions = 10), and was used as the criterion for model selection. More complex models (bivariate models) were required to yield a mean CV RMSE smaller than that of their simpler alternatives (univariable models) by ≥5% ("5% RMSE rule"). A "vote" was given to the selected model. A total of 100 "votes" were collected, each of which represented the result of a cycle of "inner" CV with a different subset. The candidate model with the largest number of "votes" was selected as the final model. The final model was fitted to the entire dataset to obtain the intercept and coefficient of the formula.
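The selection step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the model labels, and the RMSE values are hypothetical, and the sketch assumes that the best bivariate model is compared against the best univariable model in each cycle.

```python
from collections import Counter

def select_by_rmse_rule(cv_rmse, n_vars, min_gain=0.05):
    """Pick one model from a cycle of "inner" CV under the "5% RMSE rule".

    cv_rmse: model label -> mean CV RMSE (g)
    n_vars:  model label -> number of independent variables
    Assumption: the best multi-variable model must beat the best
    univariable model by at least `min_gain` (5%) to be selected.
    """
    best_simple = min((m for m in cv_rmse if n_vars[m] == 1), key=cv_rmse.get)
    complex_models = [m for m in cv_rmse if n_vars[m] > 1]
    if complex_models:
        best_complex = min(complex_models, key=cv_rmse.get)
        if cv_rmse[best_complex] <= (1.0 - min_gain) * cv_rmse[best_simple]:
            return best_complex
    return best_simple

def winning_model(votes):
    """Return the model label that received the most votes."""
    return Counter(votes).most_common(1)[0][0]

# Hypothetical mean CV RMSE values (g) from three cycles; not study data.
n_vars = {"BW": 1, "BSA": 1, "BW+age": 2}
cycles = [
    {"BW": 195.0, "BSA": 196.0, "BW+age": 190.0},  # gain < 5%: BW
    {"BW": 197.0, "BSA": 194.0, "BW+age": 192.0},  # gain < 5%: BSA
    {"BW": 193.0, "BSA": 198.0, "BW+age": 191.0},  # gain < 5%: BW
]
votes = [select_by_rmse_rule(c, n_vars) for c in cycles]
print(votes, winning_model(votes))

# A bivariate model is selected only when its gain reaches 5%.
big_gain = {"BW": 200.0, "BSA": 201.0, "BW+age": 188.0}
print(select_by_rmse_rule(big_gain, n_vars))
```

In the study, 100 such cycles were run, each with a different subset of the data, and the model with the most votes was refitted to the entire dataset.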

CV for model evaluation ("outer" CV)
"Outer" CV (number of folds = 10, number of repetitions = 10) was employed to estimate the external validity of the final model. In addition to RMSE and R², the frequency with which each model was selected in this "outer" CV was presented as a description of the sampling-related variance of model selection. The combination of "inner" and "outer" CV (nested CV) is schematically described in Fig 3.

The "outer" cross-validation: The "outer" cross-validation is for estimating the accuracy of the final model on unseen samples from the same population. As illustrated above, the entire model selection and fitting process, including the "inner" cross-validation, was repeated with different subsets of the data (training data), and their accuracy was evaluated using the spare data (test data). The model's accuracy measures are not biased when the total sample size is large enough, and the difference in sample size between the entire dataset and training dataset in the "outer" cross-validation is not critical.

Distribution of continuous variables was reported as medians with interquartile ranges, and that of categorical variables was expressed as numbers and prevalence rates. R software version 3.6.2 (R Foundation for Statistical Computing, Vienna, Austria, 2019) was used for statistical analysis.
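The nested structure of the "inner" and "outer" CV can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the study's R code: the donors are synthetic, the variable names (`bw_kg`, `bh_cm`, `graft_g`) and helper functions are hypothetical, only univariable candidates are compared (so the 5% rule is not exercised), and a single repetition is run instead of ten.

```python
import math
import random

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def holdout_rmse(rows, train, test, var):
    """Fit on the training indices; return RMSE (g) on the test indices."""
    intercept, slope = fit_line([rows[i][var] for i in train],
                                [rows[i]["graft_g"] for i in train])
    sq = [(rows[i]["graft_g"] - (intercept + slope * rows[i][var])) ** 2
          for i in test]
    return math.sqrt(sum(sq) / len(sq))

def kfold(indices, k, rng):
    """Shuffle the indices and return k (train, test) splits."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    return [([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
            for i in range(k)]

def nested_cv(rows, candidates, k_outer=10, k_inner=10, seed=0):
    """Return the mean outer-fold RMSE and the model chosen in each fold."""
    rng = random.Random(seed)
    outer_rmse, chosen = [], []
    for train, test in kfold(range(len(rows)), k_outer, rng):
        # "Inner" CV: model selection sees only the outer-training data.
        inner = kfold(train, k_inner, rng)

        def inner_score(var):
            return sum(holdout_rmse(rows, tr, te, var)
                       for tr, te in inner) / len(inner)

        best = min(candidates, key=inner_score)
        # "Outer" CV: refit the selected model and score it on held-out data.
        outer_rmse.append(holdout_rmse(rows, train, test, best))
        chosen.append(best)
    return sum(outer_rmse) / len(outer_rmse), chosen

# Synthetic donors (NOT study data): graft weight depends on body weight only.
data_rng = random.Random(42)
rows = []
for _ in range(129):
    bw = data_rng.uniform(40.0, 100.0)
    rows.append({"bw_kg": bw,
                 "bh_cm": data_rng.uniform(145.0, 190.0),
                 "graft_g": 14.8 * bw + 439.2 + data_rng.gauss(0.0, 195.0)})

mean_rmse, chosen = nested_cv(rows, ["bw_kg", "bh_cm"])
print(round(mean_rmse, 1), chosen)
```

Because the selection step is repeated inside every outer fold, the outer-fold RMSE reflects the whole selection-and-fitting procedure rather than a single pre-chosen model, and the list of chosen models shows how stable the selection is across subsets.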

PLOS ONE
Balancing accuracy and simplicity of a formula for manual use: Estimating liver weight

Donor and graft characteristics
Among the liver transplant cases in the JLTS database between 2012 and 2016 (n = 2,181), 129 pairs of WL donors and grafts were included in the analysis (Fig 1). The characteristics of the study cohort are summarized in Table 2.


Performance of the candidate models
The distribution of the RMSE of the candidate models measured in the "inner" CV is illustrated in Fig 2. Similarly, the distribution of R² is illustrated in S2 Fig. Models with multiple independent variables received no "votes" based on the "5% RMSE rule," and models using BW, Du Bois and Du Bois' BSA, and Mosteller's BSA received "votes." (The distribution of "votes" is not shown).

Fig 3. Schematic description of nested cross-validation.
The "inner" cross-validation: The "inner" cross-validation is for model selection based on the models' accuracy with unseen data. Here, the models are repeatedly fitted to different random subsets (training data), and their accuracy is evaluated with the data not used for fitting (test data). The models' accuracy with the test data is averaged through iterations and used for model selection.
https://doi.org/10.1371/journal.pone.0280569.g003

Table 3 summarizes the final fitted formulas, their in-sample fit measures, and the results of CV for model performance evaluation ("outer" CV). In-sample fit measures represent the apparent accuracy of the models on the data used to select and fit them. These do not represent the accuracy of the models on unseen data and are presented primarily for comparison with previous studies. The univariable model using BW was finally selected, whereas the univariable model using Mosteller's BSA was selected in 5% of iterations in the "outer" CV.

Final formula and measures of its estimated external validity
Table 3 footnotes: R², coefficient of determination. In-sample fit measures represent the degree of apparent accuracy of the models on the data used for fitting them. These were presented primarily for comparison with previous studies. Cross-validation results are each candidate model's frequency of being selected as the best model and its accuracy measures against independent samples in the "outer" cross-validation for model evaluation. Linear regression formulae took the following form: (graft weight) = a + b1 x1 + …

The final fitted formula is as follows: WL graft weight [g] = 14.8 × BW [kg] + 439.2. Its CV RMSE and R² were 195.2 g and 0.548, respectively. The predictive performance of the formula was also confirmed using an actual vs. estimated plot (Fig 4).
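For readers who prefer to check the arithmetic, the fitted formula (WL graft weight [g] = 14.8 × BW [kg] + 439.2) can be written as a short Python function. The function name and the input check are illustrative assumptions; the coefficients are those reported in this study.

```python
def estimate_wl_graft_weight_g(body_weight_kg):
    """Estimate whole-liver graft weight (g) from donor body weight (kg).

    Uses the univariable formula reported in this study:
    graft weight [g] = 14.8 * BW [kg] + 439.2
    """
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return 14.8 * body_weight_kg + 439.2

# Example: a hypothetical donor weighing 60 kg.
print(round(estimate_wl_graft_weight_g(60.0), 1))  # 14.8 * 60 + 439.2
```

The reported CV RMSE of 195.2 g gives a sense of the typical size of the estimation error to expect around such an estimate in a population similar to the study cohort.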
Given these results, we concluded that a simple univariable linear formula using donor BW achieves a clinically optimal balance between accuracy and simplicity in estimating WL graft weight among linear formulas using donor demographic and anthropometric variables. The use of two or more variables did not allow a clinically meaningful gain of accuracy, defined as an approximately 5% reduction in RMSE (the "5% RMSE rule").

Table 1 summarizes previously reported estimation formulas for WL graft weight or volume.

The fit of these estimation formulas to the current population was examined, while disregarding differences between populations in previous studies (Western vs. Asian, and individuals with no liver diseases or liver donor candidates vs. actual liver donors), scale (weight vs. volume), as well as measurement methods (back table vs. autopsy vs. CT). It was assumed that the measured volume (mL) was equal to the weight (g).
The fit of these formulas to the current population is summarized with R² (Table 1 and Fig 5), RMSE (Table 1 and S1 Fig), and actual vs. estimated plots (Fig 6).
The previously reported formulas for estimating liver mass showed variable accuracy, as represented by R² and RMSE, in the current cohort.

Summary of findings
To the best of our knowledge, this is the first study to quantitatively define the clinically optimal balance between accuracy and simplicity of an estimation formula for manual use, and to empirically select a formula that best meets such a criterion. The context was the estimation of WL graft weight from adult donors using demographic and anthropometric variables. To balance accuracy and simplicity, it was assumed that incorporating two or more variables should be justified by a decrease in RMSE of approximately 5% or more (the "5% RMSE rule"). Under this criterion, a univariable linear formula using BW was selected.

Incorporating clinical knowledge in model development
This is the first study in which an estimation formula for WL graft mass was developed using RMSE, a measure of model accuracy in a clinically meaningful unit (gram), combined with a criterion that reflects clinical knowledge on the optimal balance between the accuracy and complexity of estimation formulas. It was shown that balancing model complexity and accuracy based on a clinically defined criterion (the "5% RMSE rule") selects a simple univariable formula. This is aligned with the apparent lack of correlation between the number of independent variables of previously reported formulas and their RMSE or R² in the current population (Fig 5 and S1 Fig).
In this study, the number of independent variables was used as a measure of model complexity, RMSE was used as a measure of model performance, and an acceptable trade-off was defined as a 5% reduction in RMSE for one additional independent variable. In the language of machine learning, the cost function to be minimized was defined as C = R × (1 + 0.05 × (N − 1)), where C represents the cost function, R represents RMSE (g), and N represents the number of independent variables. The specific clinical criterion is open to disagreement, which warrants exploration of other clinically viable criteria. Nevertheless, incorporating clinical knowledge in the model development process represents an important principle that could improve the pragmatic implications of a broad range of research involving the development of prediction/estimation formulas for manual use.
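The cost function C = R × (1 + 0.05 × (N − 1)) can be evaluated with a few lines of Python. The function and parameter names are illustrative, and the RMSE of the bivariate model in the example is hypothetical; only the univariable value of 195.2 g comes from this study.

```python
def cost(rmse_g, n_vars, penalty=0.05):
    """Cost C = R * (1 + penalty * (N - 1)) for a model.

    rmse_g: RMSE of the model in grams (R)
    n_vars: number of independent variables (N)
    """
    return rmse_g * (1.0 + penalty * (n_vars - 1))

# Univariable BW model with the CV RMSE reported in this study.
cost_univariable = cost(195.2, 1)

# Hypothetical bivariate model with a slightly smaller RMSE (not study data).
cost_bivariate = cost(190.0, 2)

print(cost_univariable, cost_bivariate)
print("simpler model preferred:", cost_univariable < cost_bivariate)
```

Under this cost, a bivariate model breaks even with a univariable one when its RMSE is 1/1.05 (about 0.952) times the univariable RMSE, that is, a reduction of about 4.8%, which is close to the approximately 5% threshold stated in the text.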

Review of previously reported formulas for estimating liver mass
See S1 Text. Review of previously reported formulas for estimating liver mass.

Variable selection in liver mass estimation
See S2 Text. Variable selection in liver mass estimation.

The y-axis represents the coefficient of determination (R²) of each previously reported prediction model measured in the current whole-liver (WL) cohort. The x-axis represents the reported in-sample R² of previously reported prediction models, i.e., R² measured against the dataset with which the model was selected and fitted. The blue purple horizontal line represents the cross-validation R² of the current model. The blue purple vertical line, with a band, represents the in-sample R² of the current model, i.e., R² of the current model measured against the current dataset, with its bootstrap 95% range. The dark yellow dashed horizontal line represents the R² from the prediction model of DeLand and North in the current dataset (these authors did not report in-sample R²). The color of the dots represents the number of independent variables used in each prediction model.


Limitations
It should be noted that, even when the goal is to estimate liver graft weight in a population similar to that of this study, a more complex model could be selected if a substantially larger sample were used for analysis. It is also worth noting that the definition of the optimal balance between simplicity and accuracy, reflected in the cost function, involves not only clinical but also subjective judgment, and it is worthwhile to explore other potential criteria. Regarding the applicability of the formula developed here to different populations, miscalibration similar to that observed with the previously reported univariable linear formulas in the current dataset, as summarized in Table 1, Figs 5 and 6, and S1 Fig, is anticipated. Thus, developing an estimation formula optimized for a specific population may still be justified in the future, unless a "universal" formula is developed and proven, which will require far more diverse samples. In the absence of such a "universal" formula, however, the authors believe the current study will provide guidance on model selection for developing a formula in a specific population using a similar volume of data.

Conclusion
When a formula for manual use in a clinical setting is selected and fitted, a clinically optimal balance between accuracy and simplicity can be achieved using an objective criterion. Using such a criterion, in WL graft weight estimation using demographic and anthropometric variables, a univariable linear formula using BW was found to be optimal, balancing simplicity and accuracy.